Asynchronous Two-level Checkpointing Scheme for Large-scale Adjoints in the Spectral-element Solver Nek5000
نویسندگان
چکیده
Adjoints are an important computational tool for large-scale sensitivity evaluation, uncertainty quantification, and derivative-based optimization. An essential component of their performance is the storage/recomputation balance in which efficient checkpointing methods play a key role. We introduce a novel asynchronous two-level adjoint checkpointing scheme for multistep numerical time discretizations targeted at large-scale numerical simulations. The checkpointing scheme combines bandwidth-limited disk checkpointing and binomial memory checkpointing. Based on assumptions about the target petascale systems, which we later demonstrate to be realistic on the IBM Blue Gene/Q system Mira, we create a model of the expected performance of our checkpointing approach and validate it using the highly scalable Navier-Stokes spectralelement solver Nek5000 on small to moderate subsystems of the Mira supercomputer. In turn, this allows us to predict optimal algorithmic choices when using all of Mira. We also demonstrate that two-level checkpointing is significantly superior to single-level checkpointing when adjoining a large number of time integration steps. To our knowledge, this is the first time two-level checkpointing had been designed, implemented, tuned, and demonstrated on fluid dynamics codes at large scale of 50k+ cores. Keywords-Two-Level Checkpointing; Adjoints; Gradient; Large Scale; Nek5000 ; CFD; PETSc
منابع مشابه
Towards adaptive mesh refinement for the spectral element solver Nek5000
When performing computational fluid dynamics (CFD) simulations of complex flows, the a priori knowledge of the flow physics and the location of the dominant flow features are usually unknown. For this reason, the development of adaptive remeshing techniques is crucial for large-scale computational problems. Some work has been made recently to provide Nek5000 with adaptive mesh refinement (AMR) ...
متن کاملAn Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
متن کاملTowards adaptive mesh refinement in Nek5000
The development of adaptive mesh refinement capabilities in the field of computational fluid dynamics is an essential tool for enabling the simulation of larger and more complex physical problems. While such techniques have been known for a long time, most simulations do not make use of them because of the lack of a robust implementation. In this work, we present recent progresses that have bee...
متن کاملFluid-structure interaction simulation of floating structures interacting with complex, large-scale ocean waves and atmospheric turbulence with application to floating offshore wind turbines
We develop a numerical method for simulating coupled interactions of complex floating structures with large-scale ocean waves and atmospheric turbulence. We employ an efficient large-scale model to develop offshore wind and wave environmental conditions, which are then incorporated into a high resolution two-phase flow solver with fluid-structure interaction (FSI). The large-scale wind-wave int...
متن کاملEnhanced Two-level Fault Recovery Scheme Combined with Message Logging
⎯ Checkpointing schemes facilitate fault recovery in distributed systems. The two-level fault recovery scheme of distributed system inherits the merits of both disk-based and diskless checkpointing schemes. The present work extends James S Plank’s Diskless checkpointing scheme (N+1 Parity) by introducing ‘Timeout’ to checkpoint programs with high locality of reference. This mechanism enables ap...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016